Deriving continuous grounded meaning representations from referentially structured multimodal contexts

Authors

  • Sina Zarrieß
  • David Schlangen
Abstract

Corpora of referring expressions paired with their visual referents are a good source for learning word meanings directly grounded in visual representations. Here, we explore additional ways of extracting from them word representations linked to multi-modal context: through expressions that refer to the same object, and through expressions that refer to different objects in the same scene. We show that continuous meaning representations derived from these contexts capture complementary aspects of similarity, even if not outperforming textual embeddings trained on very large amounts of raw text when tested on standard similarity benchmarks. We propose a new task for evaluating grounded meaning representations—detection of potentially co-referential phrases—and show that it requires precise denotational representations of attribute meanings, which our method provides.
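The two context types named in the abstract can be illustrated with a toy sketch. This is not the authors' method: it only collects raw co-occurrence counts, whereas the paper derives continuous representations. The scene data, the function names `context_counts` and `cosine`, and the counting scheme are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt

# Hypothetical toy data: each scene maps an object id to the referring
# expressions produced for that object.
scenes = [
    {"obj1": ["red car", "small red car"], "obj2": ["blue truck", "big truck"]},
    {"obj3": ["red bus", "big red bus"], "obj4": ["blue car", "small car"]},
]

def context_counts(scenes, same_object=True):
    """Count word pairs across two different expressions in one scene.

    same_object=True pairs expressions that refer to the same object;
    same_object=False pairs expressions that refer to different objects.
    """
    counts = defaultdict(Counter)
    for scene in scenes:
        items = [(obj, expr) for obj, exprs in scene.items() for expr in exprs]
        for (obj_a, expr_a), (obj_b, expr_b) in permutations(items, 2):
            if (obj_a == obj_b) != same_object:
                continue
            for word in expr_a.split():
                for context_word in expr_b.split():
                    counts[word][context_word] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

counts_same = context_counts(scenes, same_object=True)
counts_diff = context_counts(scenes, same_object=False)

# Similarity of "red" and "blue" under each context type.
sim_same = cosine(counts_same["red"], counts_same["blue"])
sim_diff = cosine(counts_diff["red"], counts_diff["blue"])
```

In this toy data, "red" and "blue" never describe the same object, so they co-occur only in the different-object counts. That is the kind of contrast between the two context types that the abstract describes as capturing complementary aspects of similarity.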

Similar resources

The Significance of Multimodality/Multiliteracies in Iranian EFL Learners’ Meaning- Making Process

The main objective of this study was to investigate how Iranian EFL learners used their literacy practices and multimodal resources to mediate interpretation and representation of an advertisement text and construct their understanding of it. Fifteen female adolescents at an intermediate level of proficiency read the "مبلمان برلیان" (“Brelian Furniture”) advertisement text and re-created their ...

Learning Visual Attributes from Image and Text

Visual attributes are the words describing appearance properties of an object. For example, one might use gray or brown and furry to describe a cat. Visual attributes have been studied in the computer vision community [4], where the main focus has been in the automatic recognition of attributes from an image. Visual attributes have an interesting property in that they are linguistic entities an...

Peer-Assessment and Student-Driven Negotiation of Meaning: Two Ingredients for Creating Social Presence in Online EFL Social Contexts

With the current availability of state-of-the-art technology, particularly the Internet, people have expanded their channels of communication. This has similarly led to many people utilizing technology to learn second/foreign languages. Nevertheless, many current computer-assisted language learning (CALL) programs still appear to be lacking in interactivity and what is termed social presence, w...

Semiotic schemas: A framework for grounding language in action and perception

A theoretical framework for grounding language is introduced that provides a computational path from sensing and motor action to words and speech acts. The approach combines concepts from semiotics and schema theory to develop a holistic approach to linguistic meaning. Schemas serve as structured beliefs that are grounded in an agent’s physical environment through a causal-predictive cycle of a...

Grounded spoken language acquisition: experiments in word learning

Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world enabling humans to acquire and use words and sentences in context. Currently most machines which process language are not grounded. Instead, semantic representations are abstract, pre-specified, and have meaning only when interpreted by humans. We are interested in developing computational syste...

Publication date: 2017